-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Response Ops] Event log aggregations #126948
Conversation
…g/event-log-aggregate
…g/event-log-aggregate
…g/event-log-aggregate
Pinging @elastic/response-ops (Team:ResponseOps) |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Left a nit comment about the aggs query result property name (names!).
Thanks for pushing this over the line!!!!
The biggest benefactor of this will hopefully be the get alert status code, that currently does an in-JS aggregation. Will be a good test of this API, as well as if we structured the event log docs good enough to be able to get that data via aggs ...
…g/event-log-aggregate
…o1/kibana into alerting/event-log-aggregate
@elasticmachine merge upstream |
@elasticmachine merge upstream |
…g/event-log-aggregate
…o1/kibana into alerting/event-log-aggregate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked out, rebased the in-memory aggs POC (PR) against these changes and started testing aggs queries. Still finalizing the query we'd use on the security side, but I don't see anything that would need to change with how aggs were exposed from the event-log client. LGTM! Thanks for making this happen for 8.2 @ymao1! 🙂 🙌 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…g/event-log-aggregate
💚 Build SucceededMetrics [docs]Public APIs missing comments
Public APIs missing exports
Unknown metric groupsAPI count
ESLint disabled line counts
Total ESLint disabled count
History
To update your PR or re-run it, just comment with: cc @ymao1 |
## Summary Resolves #119598, #119599, #101014 Test plan ([internal doc](https://docs.google.com/document/d/1-prIUGYaPHiwGA79CgSdw1926lxIPKGWWkYOUD2BM1U/edit#heading=h.womzsfdt6zt8)) Adds `Rule Execution Log` table to Rule Details page: <p align="center"> <img width="700" src="https://user-images.githubusercontent.com/2946766/158540840-e9cddb9b-f33d-4b95-86ad-cb3e0a00cf39.gif" /> </p> ### Implementation notes The useful metrics within `event-log` for a given rule execution are spread between a few different platform (`execute-start`, `execute`) and security (`execution-metrics`, `status-change`) events. In effort to provide consolidated metrics per rule execution (and avoiding a lot of empty cells and mis-matched statuses like in the image below) <p align="center"> <img width="700" src="https://user-images.githubusercontent.com/2946766/151933881-2e58f4d7-4cda-4528-9d44-37cb7bd5de9c.png" /> </p> these rule execution events are aggregated by their `executionId`, and then fields are merged from each different event. This PR was re-worked to take advantage of the new event-log aggregation support added in #126948, and is no longer implemented as an in-memory aggregation server side. * Due to restrictions around supplying search filters that may match multiple sub-agg buckets and missing data ([see discussion here](https://github.com/elastic/kibana/pull/127339/files#r825240516)), it was decided that we'd disable the search bar for the time being. We have both a near-term (writing single rollup event) and long-term (ES|QL) solution that will allow us to re-enable this functionality. * Note, since a `terms` agg is used to fetch all execution events, an upper bound must be set. See [this discussion](https://github.com/elastic/kibana/pull/127339/files#r823035420) for more details, but setting this max to `1000` events for the time being, and returning total cardinality of execution events back within `total` to allow the UI to inform the user that they should narrow their search further to better isolate and find possible issues. This should be a be a reasonable constraint for most all rules as a rule executing every 5 minutes, 1000 executions would cover over 3 days of execution time. <p align="center"> <img width="700" src="https://user-images.githubusercontent.com/2946766/159045563-966896b4-3cd1-475d-9f0e-c2d300683546.png" /> </p> The `Filter for alerts` action will be available on all `Succeeded`/`Partial Failure` executions even if there weren't alerts generated until #126210 is merged and we can start returning the alert count, at which point we can programmatically enabled/disable this action based on alert count. <p align="center"> <img width="300" src="https://user-images.githubusercontent.com/2946766/159051762-e2f97ba4-4ce1-4f67-8ae1-395e4b191cab.png" /> </p>
Towards #125645
Summary
This PR will be used to support adding an execution log API to the rules client but nothing uses the event log aggregation yet in this PR.
Checklist